Abstract

Detecting and classifying leaf diseases in cashew crops is critical for farmers to find pest and disease infections. Cashew leaf diseases can reduce productivity if not detected early. An automated image processing method for leaf disease identification reduces time and expense and contributes to a rise in cashew nut yield. For image segmentation, Canny edge detection and an active contour model are utilized. Principal Component Analysis (PCA) is applied for feature extraction once the contours have been detected. After the features have been extracted, they are submitted for classification. This study analyzed the accuracy, precision, and recall of several classifiers: Random Forest, SVM, KNN, and Naive Bayes. This research tries to answer which machine learning classifier provides the best results when the diseased area is segmented using the Canny edge detection and contour detection technique.

Keywords: Cashew Leaf, Canny Edge Detection, Machine Learning Technique, Principal Component Analysis


1. Introduction


The growth and profitability of India's agriculture sector are primary drivers of the country's economy. Identifying individual leaf diseases is among the most difficult tasks in agricultural applications. Image processing techniques are becoming increasingly popular as a tool for raising agricultural efficiency and improving farming practices, boosting the precision and quality of procedures while reducing the amount of manual monitoring performed by farmers.


Studies on leaf-based classification and the detection of diseased leaf images are also vital for identifying plant diseases. Detecting edges and classifying data effectively can be challenging because of noise and classification errors. To get better classification performance, we first need to construct an accurate model (Greeshma, Balakrishnan, et al.; Tulshan, Raul, et al.). Diseases that affect crop leaves can vary in size, shape, and appearance. Some diseases have the same color but different shapes, while others have different colors but the same shape. Machine learning methods are frequently used to recognize images of the afflicted leaves. This study describes various machine learning algorithms used to detect whether a plant is afflicted with a disease. This was accomplished through a series of phases, including image acquisition, feature extraction, disease classification, and display of the results (Varshney et al.).


This work investigates which machine learning classifier performs best when the diseased region is segmented using Canny edge detection and contour detection. Features are then extracted using PCA, and the data are classified. The rest of this study is organized as follows: Section 2, Literature Review; Section 3, Proposed Model; Section 4, Result and Discussion; and Section 5, Conclusion and Future Scope.


2. Literature Review 


Tulshan, A. S., et al. (2019) preprocess and segment the collected data, then apply feature extraction and classify the result using K-Nearest Neighbor (KNN). The proposed implementation predicts leaf diseases in plants with an accuracy of 98.56 percent. Further data on a leaf disease are also reported, including the accuracy, affected area, sensitivity, disease name, and elapsed time (Tulshan, Raul, et al.).


Zamani, A. S., et al. (2022) describe a framework for identifying leaf disease that takes a picture of a leaf as input. First, noise is removed from the leaf images during preprocessing using a mean filter. Image quality is then improved via histogram equalization. Segmentation splits a single image into multiple parts or segments, which helps to define the boundaries within the image; the K-Means method is employed to segment the image. Principal component analysis is used for feature extraction. The final step is classifying the images using random forest, RBF-SVM, ID3, and SVM (Zamani et al.).


Syed-Ab-Rahman et al. (2022) employ a dual-stage deep classifier to detect plant diseases and classify citrus diseases from leaf photos. The proposed model has two primary steps: (i) identifying unhealthy areas using a region proposal network, and (ii) assigning the most likely target area to the appropriate disease class via a classifier. In terms of detection, the proposed model achieves an accuracy of 94.37 percent, with an average precision of 95.8 percent (Farhana Syed-Ab-Rahman, Hesamian, and Prasad).


Prabu, M., et al. (2022) suggested a mango leaf disease classification framework comprising data preparation, feature selection, learning and classification, and performance evaluation. They chose 380 healthy and diseased images (Mango Anthracnose, Bacterial black spot, and Sooty mold). Data augmentation methods reduce overfitting and improve generalization. Next, a crossover-based levy flight distribution convolutional neural network improves feature selection. The pre-trained MobileNetV2 model is employed for learning, and at the end, a support vector machine classifies the diseases (Prabu and Chelliah).

Kumar, V. V., et al. (2022) described, identified, and quantified paddy crop diseases such as brown spot, bacterial blight, and leaf blast. Risk analysis of paddy crop leaf images detects and recognizes these diseases. Deep convolutional neural networks (DCNNs) and fuzzy logic are combined in their Deep Convolutional Neuro-Fuzzy Method (DCNFM). Fuzzy logic and DCNNs help the synthesis extract critical features from unstructured input (V. V. Kumar et al.).

Al-gaashani, M. S., et al. (2022) proposed using transfer learning and feature concatenation to classify tomato leaf diseases. Features are extracted with the pre-trained kernels (weights) of MobileNetV2 and NASNetMobile, then concatenated and reduced in dimensionality using kernel principal component analysis. These features are then fed into a standard learning algorithm. Experiments show that the concatenated features improve classifier performance. Among random forest, support vector machine, and multinomial logistic regression, multinomial logistic regression performed best, with an average accuracy of 97% (Al-gaashani et al.).

Ali, S., Hassan, et al. (Ali et al.) presented potato disease detection using Feature Fusion with PCA-LDA classification (FF-PCA-LDA). Handcrafted hybrid features and deep features are extracted from RGB images; TL-ResNet50 extracts the deep features. The handcrafted hybrid features and the deep features are then fused. After fusing the image features, PCA selects the most discriminant properties for LDA model development (Kruse et al.; B. Sharma, V. K. Sharma, and S.; M. Kumar; Fraiwan et al.), which provides 98.20% accuracy.

Caglayan, A., et al. (Caglayan, Guclu, and Can) show how images of leaves can be used to identify plants. Classification algorithms such as k-Nearest Neighbor, Support Vector Machines, Naive Bayes, and Random Forest [17] use the shape and color features of leaf images to determine the kind of plant. The method is tested on 1897 leaf images covering 32 different kinds of leaves. The results showed that the Random Forest method reached a recognition accuracy of up to 96% when both shape and color features are used.

Gulhane, V. A., et al. (Gulhane and Kolekar) use the Nearest Neighbor Classifier (NNC) and Principal Component Analysis (PCA) to diagnose diseases on cotton leaves. Once the PCA/KNN multi-variable algorithms have been implemented, the statistical information for the green (G) channel of an RGB image can be examined. Because diseases or elemental deficiencies are reflected accurately by the green channel, it is used for reliable feature gathering. PCA/KNN-based classifiers were observed to reach a classification accuracy of 95%.

Sujith, A., et al. (Sujith and Aji) propose an approach that builds an optimal feature set via feature extraction methods using Local Binary Pattern (LBP), Gray Level Co-Occurrence Matrix (GLCM) (Fraiwan et al.), and Histogram of Oriented Gradients (HOG). Neighborhood Component Analysis (NCA) optimizes the combined feature vector. Classification performance and computational efficiency are improved using feature selection and dimensionality reduction approaches. The proposed method has an average classification accuracy of 97.63% in 291.24 seconds of computation time, using three plant datasets: Flavia, D-Leaf, and Swedish Leaves.


3. Proposed Model 


This section describes the proposed model of the study. We begin with dataset collection; the collected dataset contains 100 healthy and 100 infected leaves. After data augmentation, we have 850 healthy and 850 diseased leaves in our Cashew Crop Diseased Database (CCDDB) dataset. Data transformations such as scaling, rotation, and flipping are carried out in pre-processing.
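The transformations named above can be sketched as follows. This is a minimal NumPy-only illustration under stated assumptions: the helper name `augment`, the 90-degree rotation steps, and the nearest-neighbor 2x upscaling are hypothetical choices, not the augmentation settings actually used to build CCDDB.

```python
import numpy as np


def augment(image):
    """Return flipped, rotated, and scaled variants of an H x W x C image.

    Hypothetical helper: the specific transformations are illustrative only.
    """
    return [
        np.flip(image, axis=1),                # horizontal flip
        np.flip(image, axis=0),                # vertical flip
        np.rot90(image, k=1, axes=(0, 1)),     # rotate by 90 degrees
        np.rot90(image, k=2, axes=(0, 1)),     # rotate by 180 degrees
        # nearest-neighbor upscaling by a factor of 2 along both spatial axes
        np.repeat(np.repeat(image, 2, axis=0), 2, axis=1),
    ]


# Synthetic 4 x 6 RGB array standing in for a leaf photo
leaf = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)
augmented = augment(leaf)
shapes = [variant.shape for variant in augmented]
```

Each source image yields several variants here; repeating such transformations with different parameters is one way a small collection can be expanded to a larger balanced set.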

Canny edge detection is carried out for segmentation. Image segmentation, feature extraction, and classification of diseased cashew leaves are all shown in Fig. 1 of the proposed model.

FIGURE 1. Proposed Model for diseased cashew leaf classification

Edge detection: Edge detection is a crucial task in pattern recognition. It can be characterized as identifying the borders between regions with different attributes, such as intensity or texture.

Canny edge detection: The Canny edge detector is a multi-stage technique for image edge detection. It was introduced by John F. Canny in the 1986 publication "A Computational Approach to Edge Detection". In a nutshell, edge detection is the technique of finding edges in an image. An edge is often a sharp change in color or intensity from one image pixel to the next, such as from black to white. The Canny detector is one of the most common ways to find edges and is popular because it gives good results.

There are four stages involved in the Canny edge detection algorithm:

• Noise reduction, accomplished by blurring the image with a Gaussian function.

• Computation of the image's intensity gradients.

• Thinning of the edges (non-maximum suppression).

• Hysteresis thresholding.
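To make the last stage more concrete, the sketch below implements hysteresis thresholding on a precomputed gradient magnitude array. It is an illustrative NumPy version, not OpenCV's internal implementation; the function name `hysteresis_threshold`, the 8-connected neighborhood, and the toy threshold values are assumptions.

```python
from collections import deque

import numpy as np


def hysteresis_threshold(magnitude, low, high):
    """Keep strong pixels (>= high) and weak pixels (>= low) connected to them."""
    strong = magnitude >= high
    weak = magnitude >= low
    edges = strong.copy()
    rows, cols = magnitude.shape
    # Breadth-first search starting from every strong pixel
    queue = deque(zip(*np.nonzero(strong)))
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):          # visit the 8-connected neighborhood
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    if weak[nr, nc] and not edges[nr, nc]:
                        edges[nr, nc] = True
                        queue.append((nr, nc))
    return edges


# One strong pixel (9), a chain of weak pixels (5), and one isolated weak pixel
magnitude = np.array([
    [0, 0, 0, 0, 0],
    [9, 5, 5, 0, 5],
    [0, 0, 0, 0, 0],
])
edges = hysteresis_threshold(magnitude, low=4, high=8)
```

In this toy input the weak pixels adjacent to the strong pixel are kept, while the isolated weak pixel in the last column is discarded, which is how the two thresholds suppress noise without breaking genuine edges.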

First, we import our dataset, then convert each image to grayscale, and finally use the cv2.GaussianBlur filter to blur the image and remove noise. After that, we use the cv2.Canny function to apply the Canny edge detector. This function accepts six parameters, three of which are required. In our case, we used only the required parameters. The image in which we wish to detect edges is passed as the first argument. Hysteresis thresholding uses two thresholds; the second and third arguments are those thresholds.

Figure 2(a) illustrates an example of a leaf image used as input for the proposed model. Before carrying out the Canny edge detection procedure, the input leaf image needs to be converted to grayscale, as shown in Fig. 2(b). Fig. 2(c) displays the Canny edge detection result, and the contour detection applied to the original leaf is displayed in Fig. 2(d). We can see that the algorithm prioritized the edges it considered most significant. Varying the threshold values shows how they affect which edges are detected.

FIGURE 2. Experiment on Sample image: (a) Sample Input Leaf Image, (b) Grayscale Image, (c) Canny Edge Detection Image, (d) Contour Detection Applied

Contour Detection Technique: A "contour" refers to a curve connecting all points along a shape's boundary. Contours can be detected most accurately in binary images. As a result, each image must be converted to grayscale and then thresholded. The cv2.findContours function takes three arguments: the source image, the contour retrieval mode, and the contour approximation method. To determine the shapes of the objects, we used the binary image produced by the Canny edge detector. The RETR_TREE retrieval mode preserves the contour hierarchy. The function's output includes the image, the contours, and the hierarchy (OpenCV 4.x returns only the contours and the hierarchy). All image contours are included in the output.

Principal Component Analysis: PCA reduces the number of variables in high-dimensional datasets while keeping the underlying structure and patterns intact. The purpose is to identify and isolate the most salient aspects of the data, which are then expressed as a collection of summary indexes known as principal components. PCA thus compresses the data into fewer dimensions that serve as feature summaries.
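As a rough illustration of this compression, the sketch below computes principal components with a singular value decomposition. The function name `pca`, the random input data, and the choice of 10 components are assumptions; the source does not report how many components were retained.

```python
import numpy as np


def pca(features, n_components):
    """Project row-wise samples onto their top principal components via SVD."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    explained_variance = singular_values[:n_components] ** 2 / (len(features) - 1)
    return centered @ components.T, components, explained_variance


rng = np.random.default_rng(0)
# 200 hypothetical samples with 50 features each (e.g. flattened contour images)
X = rng.normal(size=(200, 50))
X_reduced, components, variance = pca(X, n_components=10)
```

Each row of `X_reduced` is a 10-value summary of the original 50 features, and `variance` reports how much of the data's spread each retained component captures.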

Classifiers: This study uses four different classifiers: Random Forest, SVM, KNN, and Naive Bayes. Support vector machines (SVMs) are a group of supervised learning methods that can be used for classification and regression, as well as for detecting outliers. They are effective in high-dimensional spaces, even when the number of dimensions exceeds the number of samples. Random Forest (RF) relies on many decision trees that together form a "forest." An ensemble of several decision trees can reach a more sound and reliable conclusion than a single decision tree (DT) alone. The K-Nearest Neighbors (K-NN) algorithm assumes that a new case is similar to existing cases and places it into the category it most resembles. The K-NN algorithm stores all available data and classifies a new point based on its similarity to the stored data. This means that when new data becomes available, it can be quickly sorted into a well-suited category. The Naive Bayes classifier is a supervised machine learning algorithm. It belongs to the family of generative learning algorithms, which model the distribution of the inputs for a particular class or category.
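A comparison of these four classifiers on PCA-reduced features might be set up as in the sketch below, using scikit-learn. Everything specific here is an assumption for illustration: the synthetic data from `make_classification`, the 80/20 split, the 10 PCA components, the Gaussian variant of Naive Bayes, and all hyperparameters. The resulting scores say nothing about the study's own results.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic two-class data standing in for features of healthy/diseased leaves
X, y = make_classification(n_samples=400, n_features=40, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Fit PCA on the training split only, then apply it to both splits
pca = PCA(n_components=10).fit(X_train)
X_train_p, X_test_p = pca.transform(X_train), pca.transform(X_test)

classifiers = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
}

# Train each classifier and record its accuracy on the held-out split
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train_p, y_train)
    scores[name] = clf.score(X_test_p, y_test)
```

Fitting PCA on the training split alone keeps information from the test images out of the feature extraction step, so the held-out accuracy is not inflated.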


4. Result and Discussion 


This section presents the results and discussion of this study. Four classifiers were evaluated: Random Forest, SVM, KNN, and Naive Bayes. Of these, Naive Bayes outperforms the other classifiers.

During contour detection, the convergence criteria used in the energy minimization procedure determine the accuracy of the results. A higher level of precision calls for more stringent convergence criteria, leading to longer computation times. For this reason, the classifier's performance is not satisfactory. Because active contours prioritize minimizing the energy along their outlines, they frequently fail to capture minute details.

From Table 1, we can see that the accuracy of Naive Bayes is comparatively higher than that of the other classifiers (50.88% versus 49.12%).

The accuracy graph of the four machine learning classifiers is shown in Fig. 3, where Naive Bayes provides the best results.
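For reference, the sketch below shows one common way to derive accuracy, precision, and recall from a confusion matrix, using support-weighted averages over the classes. The averaging scheme is an assumption, since the source does not state which one was used, and the counts are hypothetical rather than taken from the study.

```python
import numpy as np


def weighted_metrics(confusion):
    """Accuracy plus support-weighted precision and recall.

    confusion[i, j] counts samples of true class i predicted as class j.
    """
    confusion = np.asarray(confusion, dtype=float)
    true_positives = np.diag(confusion)
    support = confusion.sum(axis=1)      # samples per true class
    predicted = confusion.sum(axis=0)    # samples per predicted class
    # Per-class precision is taken as 0 when a class is never predicted
    precision = np.divide(true_positives, predicted,
                          out=np.zeros_like(true_positives), where=predicted > 0)
    recall = np.divide(true_positives, support,
                       out=np.zeros_like(true_positives), where=support > 0)
    weights = support / support.sum()
    accuracy = true_positives.sum() / confusion.sum()
    return float(accuracy), float(weights @ precision), float(weights @ recall)


# Hypothetical counts: rows = true (healthy, diseased), columns = predicted
confusion = [[40, 10],
             [20, 30]]
accuracy, precision, recall = weighted_metrics(confusion)
```

Under this averaging, weighted recall always equals accuracy, which would be consistent with the identical Accuracy and Recall rows in Table 1 if the same scheme was used there.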


FIGURE 3. Accuracy Graph of Different Machine Learning Classifiers


TABLE 1. Metrics of different Machine Learning Classifiers

              Random Forest   SVM     KNN     Naive Bayes
Accuracy %    49.12           49.12   49.12   50.88
Precision %   100.0           24.13   24.13   25.89
Recall %      49.12           49.12   49.12   50.88


FIGURE 4. Confusion Matrix of Different Machine Learning Classifiers (recovered panel titles: A. Random Forest, D. Naive Bayes)


5. Conclusion and Future Scope


This study investigates which machine learning classifier performs well when Canny edge detection coupled with the contour detection algorithm is used to partition the diseased region. After that, the PCA method is used for feature extraction, and the data is then classified. When Canny edge detection is performed on images, they become less distinct because of the Gaussian smoothing, which also blurs the edges. When there is a significant difference in brightness between the foreground objects and the image's background, the contour detection technique performs remarkably well. However, it frequently gets trapped in a local minimum, and it runs more slowly on large images. In this study, Naive Bayes performs better than the other classifiers. In the future, we will establish an effective segmentation and feature extraction method to classify diseased cashew leaf images.